Summary

Given that a distribution function is a member of a subclass of
absolutely continuous measures, we consider the problem of nonparametric
estimation, by the method of maximum likelihood, of the underlying density
function of a given sample of independent identically distributed random
variables. Sufficient conditions on the space of probability densities and
its topology are given for the consistency of such an estimate.


1. A Topology of Pointwise Convergence. Let our random variables take
values in some $\sigma$-finite measure space $(\Omega, \mathcal{B}, \mu)$.
Hereafter we refer to the Radon-Nikodym derivative of a measure with
respect to $\mu$ as its density. By a probability measure (denoted F, G, H
with or without subscripts) we mean a nonnegative Borel measure on
$(\Omega, \mathcal{B})$ with $F(\Omega) \le 1$. (Note that $F(\Omega)$ may
be < 1.) We shall consider a subclass $\mathcal{L}$ of densities (denoted
f, g, h, $f = dF/d\mu$, etc.) of absolutely continuous probability
measures, and give $\mathcal{L}$ a topology of pointwise convergence.
Specifically, given an element $f \in \mathcal{L}$, we assume there exists
a fixed set $B_f \in \mathcal{B}$, called the exceptional set of f, such
that $\mu(B_f) = 0$ and a net $\{f_\alpha, \alpha \in A\}$ in $\mathcal{L}$
converges to f whenever

$$\lim_\alpha B_{f_\alpha} = \bigcup_{\alpha \in A} \bigcap_{\beta \ge \alpha} B_{f_\beta} \subset B_f$$

and

$$\lim_\alpha f_\alpha(x) = f(x) \quad \text{for all } x \text{ not in } B_f.$$

It is shown that the collection of all pairs $(\mathcal{S}, f)$, where
$\mathcal{S} = \{f_\alpha\}$ is a net converging to f, both $f_\alpha$ and
f elements of $\mathcal{L}$, satisfies the conditions for a convergence
class (cf. Section 4). Hence there is a unique topology on $\mathcal{L}$
such that we have precisely the convergence indicated. (The collection of
pairs $(\mathcal{S}, f)$ will define a closure operator, which in turn
defines a topology.)

To exemplify the idea of exceptional sets, we mention briefly an
application. Let $(\Omega, \mathcal{B}, \mu)$ be the right half line
$[0, +\infty)$, the Borel field, and Lebesgue measure. Let $\mathcal{L}$ be
the class of densities which are nonincreasing on $\Omega$, with total mass
one. Then any element of $\mathcal{L}$ has at most countably many
discontinuities. Given an element $f \in \mathcal{L}$, let the exceptional
set of f be the collection of its discontinuities. Then our topology on
$\mathcal{L}$ is precisely the same as the topology of convergence at
points of continuity.

2. Nonparametric Maximum Likelihood Estimation. We shall be concerned
with estimating by the method of maximum likelihood the underlying
probability density f given a sample $\mathbf{x}_n$ of size n of
independent identically distributed random variables with observations
$x_1, x_2, \ldots, x_n$, and given that f is a member of some class
$\mathcal{L}$.

That is, given a sample $\mathbf{x}_n$, we define the "likelihood"
functional L on $\mathcal{L}$:

$$L(g, \mathbf{x}_n) = \sum_{i=1}^{n} \log g(x_i),$$

and for a fixed sample $\mathbf{x}_n$, we define our "maximum likelihood
estimate" of the density of $\mathbf{x}_n$ to be a point (if one exists) of
$\mathcal{L}$ which maximizes $L(\cdot, \mathbf{x}_n)$.
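As a minimal sketch of the functional just defined, the following evaluates
the sum of log g(x_i) for a few candidate densities and selects the
maximizer. The exponential candidates, the helper names `log_likelihood` and
`exponential_density`, and the four-point sample are illustrative
assumptions only; they stand in for the class of densities and are not the
nonparametric classes treated in this paper.

```python
import math

def log_likelihood(g, sample):
    """Sum of log g(x_i) over the sample; -inf if g vanishes at a sample point."""
    total = 0.0
    for x in sample:
        value = g(x)
        if value <= 0.0:
            return float("-inf")
        total += math.log(value)
    return total

def exponential_density(rate):
    """Hypothetical candidate density rate * exp(-rate * x) on [0, +inf)."""
    return lambda x: rate * math.exp(-rate * x) if x >= 0 else 0.0

# Illustrative sample and a small finite set of candidates (assumptions).
sample = [0.2, 0.5, 1.1, 0.7]
candidates = {rate: exponential_density(rate) for rate in (0.5, 1.0, 1.6, 3.0)}

# The estimate is the candidate that maximizes the likelihood functional.
best_rate = max(candidates, key=lambda r: log_likelihood(candidates[r], sample))
```

Among these candidates the maximizer is the rate 1.6, which coincides with
n / (x_1 + ... + x_n) for this sample. Returning minus infinity when a
candidate vanishes at an observation mirrors the convention that such a
candidate can never maximize the functional.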

It is our intent to show that with certain restrictions on the class
of distributions and its topology the maximum likelihood estimate is
consistent, that is, as our sample size gets large, the estimate of the
density almost surely converges to the true density.

Often our estimate $\hat f_n$ will not be a proper member of $\mathcal{L}$,
but instead the density $\hat f_n$ will be a limit of elements of
$\mathcal{L}$. Thus we are led to consider a compactification
$\bar{\mathcal{L}}$ of $\mathcal{L}$.

We assume the elements of $\bar{\mathcal{L}}$ are densities of probability
measures, and corresponding to every element f in $\bar{\mathcal{L}}$ there
is an exceptional set $B_f$ of $\mu$ measure zero such that $f_\alpha$
converges to f if and only if $f_\alpha(x)$ converges to f(x) for all x not
in the exceptional set and $\lim_\alpha B_{f_\alpha} \subset B_f$.

Since the correspondence between measures and densities is not one
to one, we find it convenient to work in quotient spaces $\mathcal{L}/R$,
$\bar{\mathcal{L}}/R$, where R is the equivalence relation

f R g in case f(x) = g(x)

except on a set of $\mu$ measure zero. (cf. Section 5.)

The added complexity of considering the quotient space with the
quotient topology is bothersome but unavoidable. If $\mathcal{L}$ and
$\bar{\mathcal{L}}$ are such that (as is the case in the classical
k-dimensional cases) for f, g in $\bar{\mathcal{L}}$, f R g if and only if
f = g, then the quotient spaces $\mathcal{L}/R$ and $\bar{\mathcal{L}}/R$
are precisely the same as $\mathcal{L}$ and $\bar{\mathcal{L}}$.

Classically the method of maximum likelihood has been restricted to
families of distribution functions having some k-dimensional
parameterization. The distributions are then given the metric topology
induced by the usual metric on the parameter space and the estimates are
shown to be consistent subject to certain regularity conditions.

Consistency, however, is essentially a topological property, and it
is unduly restrictive to establish it only on the basis of a
parameterization.

The problem of estimating a measure instead of its density is in
many ways more appealing. However, with the method of maximum likelihood
it is the density which is important in picking an estimate. The
transformation between $\bar{\mathcal{L}}/R$ and the corresponding class of
probability measures is, of course, one to one, but the corresponding
estimate of the measure will in general be consistent if and only if the
transformation from densities to measures is continuous. That is, our
corresponding estimate of the measure will be consistent when we give the
measures a topology which is contained in the topology induced by the
transformation from elements of $\bar{\mathcal{L}}/R$ to the corresponding
measures. It is interesting to note (cf. Section 6) that in many cases the
topology induced on the corresponding measures is precisely the topology
of convergence in distribution. For instance, this is the case if we take
$\mathcal{L}$ to be the class of all unimodal densities uniformly bounded
by M (cf. Section 3), or if we take $\mathcal{L}$ to be the class of
densities with increasing hazard rates (cf. [4]).

Note that in the classical cases of k-dimensional parameterization
there is a continuous one to one transformation from the parameter space
to the densities (with our topology) and similarly to the class of measures
with the topology of convergence in distribution. Since Euclidean k-space
has the property of invariance of domain it follows that these
transformations are bicontinuous, and hence the transformation from
densities to measures is continuous. Moreover, it is clear that the
classical classes of densities are locally compact and locally separable,
since they are homeomorphic to a subspace of k-space.

Following A. Wald (cf. [5]), we intend to give a proof of consistency
assuming that we have a locally separable, locally compact quotient space
$\mathcal{L}/R$ of probability densities, which together with a suitable
compactification $\bar{\mathcal{L}}/R$ satisfies Conditions 1-3. By locally
separable, we mean that the neighborhood system at a point has a countable
base.

Section 6 is a topological discussion which is intended to make these
hypotheses more amenable.

Condition 1. $\mathcal{L}/R$, with the above mentioned topology of
pointwise convergence, is a locally separable, locally compact Hausdorff
space.

We assume that there exists some Hausdorff compactification
$\bar{\mathcal{L}}$ of $\mathcal{L}$ such that $\bar{\mathcal{L}}/R$ and
$\mathcal{L}/R$ satisfy the following conditions.

The following is of paramount importance to the method of maximum
likelihood and is known to be true under a wide range of conditions
(cf. Theorem 6.1, and pages 5, 6, 14 of [3]). Let P be the projection of
$\bar{\mathcal{L}}$ into $\bar{\mathcal{L}}/R$.

Condition 2. If $P g \in \bar{\mathcal{L}}/R$, $P f \in \mathcal{L}/R$ are
distinct elements of $\bar{\mathcal{L}}/R$, then

$$E_F \log g(x) < E_F \log f(x)$$

and

$$-\infty < E_F \log f(x) < +\infty.$$

(Here $E_F$ denotes integration over $\Omega$ with respect to F.)

For a subset B of $\bar{\mathcal{L}}/R$ let

$$s(x, B) = \sup_{Pf \in B} f(x).$$

Condition 3. If $P g \in \bar{\mathcal{L}}/R$, $P f \in \mathcal{L}/R$ are
distinct points of $\bar{\mathcal{L}}/R$, then for sufficiently small
neighborhoods B of P g, the function

$$\log s(\cdot, B)$$

is measurable and bounded above by some function which is integrable
with respect to F.

Lemma 1. Let $\{U_n\}$ be a decreasing sequence of neighborhoods of
P g in $\bar{\mathcal{L}}/R$ such that $\bigcap_n U_n = \{P g\}$.

Then

$$\lim_n E_F \log s(x, U_n) = E_F \log g(x)$$

for any F with density in $\mathcal{L}$.

Proof. In view of Condition 2 we have only to show that

(1.1) $\lim_n s(x, U_n) = g(x)$ a.s. F;

for the asserted result then follows from the fact that $s(x, U_n)$ is
decreasing and the bounded convergence theorem. To show (1.1) we begin
by using Condition 1 and throwing out a set B of $\mu$ measure zero (and
hence of F measure 0) so that on $\Omega \sim B$, $g_n$ converging to g
implies $g_n(x)$ converges to g(x). Now, for fixed x not in B and any
$\epsilon > 0$ we can pick $g_n$ in $U_n$ so that
$s(x, U_n) - g_n(x) < \epsilon$. It follows that

$$\lim_n [s(x, U_n) - g_n(x)] = \lim_n s(x, U_n) - g(x) \le \epsilon$$

for any $\epsilon$, thus $s(x, U_n) \downarrow g(x)$, as was to be shown.

Given a sample $X_1, \ldots, X_n$ of independent random variables with
density f, let $\hat f_n$ be a point in $\bar{\mathcal{L}}$ which maximizes
the likelihood functional $L(\cdot, \mathbf{X}_n)$. We say that the
sequence of estimates $\{\hat f_n\}$ is consistent in case $P \hat f_n$
converges to P f in the topology of $\bar{\mathcal{L}}/R$ with F
probability one. Note that this is in general stronger than saying that
$\hat f_n(x)$ converges almost surely to f(x) except on a set of $\mu$
measure 0.

It is not obvious that the event {$P \hat f_n$ converges to P f in the
topology of $\bar{\mathcal{L}}/R$} is always measurable. It is our intent
to show that the complement of this event is a subset of a measurable set
of F measure zero, and hence measurable. Similarly, in the sequel,
measurability is implicit in the statements that events have F measure one.

The essence of the fact that maximum likelihood estimates are
consistent is contained in the following theorem.

Theorem 1. Let P g in $\bar{\mathcal{L}}/R$ and P f in $\mathcal{L}/R$ be
two distinct elements of $\bar{\mathcal{L}}/R$ and let $\mathbf{X}_n$ be a
sample with density function f.

Then for sufficiently small neighborhoods U of P g:

$$\limsup_n \Big[ \sup_{Ph \in U} L(h, \mathbf{X}_n) - L(f, \mathbf{X}_n) \Big] < 0$$

with F probability 1.

Proof. Given P g not equal to P f there exists (by Condition 2 and
Lemma 1) a small neighborhood U of P g such that

(1.2) $E_F \log s(X, U) < E_F \log f(X)$.

Now

$$\sup_{Ph \in U} \frac{1}{n} \sum_{i=1}^{n} \log h(X_i) \le \frac{1}{n} \sum_{i=1}^{n} \log s(X_i, U).$$

Taking limits we see by the strong law of large numbers that with F
probability one

$$\frac{1}{n} \sum_{i=1}^{n} \log s(X_i, U) \to E_F \log s(X, U)$$

and

$$\frac{1}{n} \sum_{i=1}^{n} \log f(X_i) \to E_F \log f(X)$$

and the result follows by (1.2).

Corollary to Theorem 1. Let $\mathbf{X}_n$ and P f be as in Theorem 1.
Let D be any closed set not containing P f. Then

$$\limsup_n \Big[ \sup_{Ph \in D} L(h, \mathbf{X}_n) - L(f, \mathbf{X}_n) \Big] < 0$$

with F probability 1.

Proof. By Theorem 1, any P g in D can be covered by an open neighborhood
$U_g$ with the property that

$$\limsup_n \Big[ \sup_{Ph \in U_g} L(h, \mathbf{X}_n) - L(f, \mathbf{X}_n) \Big] < 0.$$

From the open cover $\{U_g, P g \in D\}$ of D, let
$U_{g_1}, \ldots, U_{g_m}$ be a finite subcover. Then with F probability 1,

$$\limsup_n \Big[ \sup_{Ph \in D} L(h, \mathbf{X}_n) - L(f, \mathbf{X}_n) \Big]
\le \limsup_n \Big\{ \max_{1 \le i \le m} \Big[ \sup_{Ph \in U_{g_i}} L(h, \mathbf{X}_n) - L(f, \mathbf{X}_n) \Big] \Big\}
= \max_{1 \le i \le m} \limsup_n \Big[ \sup_{Ph \in U_{g_i}} L(h, \mathbf{X}_n) - L(f, \mathbf{X}_n) \Big] < 0$$

as was to be shown.
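The strong-law step in the proof of Theorem 1 can be illustrated
numerically. The sketch below is only an assumption-laden example: it takes
the true density to be f(x) = exp(-x) on the half line, replaces the
supremum s(x, U) over a neighborhood by a single competing density
g(x) = 2 exp(-2x), and uses the hypothetical helper `average_log_ratio`. For
this pair, E_F log g(X) - E_F log f(X) = log 2 - 1, which is negative.

```python
import math
import random

def average_log_ratio(n, rate_g=2.0, seed=0):
    """Return (1/n) * sum_i [log g(X_i) - log f(X_i)] for X_i i.i.d. with
    density f(x) = exp(-x) and competitor g(x) = rate_g * exp(-rate_g * x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0)               # draw from the true density f
        log_g = math.log(rate_g) - rate_g * x  # log of the competing density
        log_f = -x                             # log of the true density
        total += log_g - log_f
    return total / n

# Population value of the averaged difference when rate_g = 2.
limit = math.log(2.0) - 1.0
estimate = average_log_ratio(20000)
```

For large n the sample average settles near `limit`, so the averaged
log-likelihood of the competitor eventually stays below that of the true
density, which is the behavior the theorem asserts uniformly over a small
neighborhood.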

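Theorem 2. (Consistency of Maximum Likelihood Estimate) Let
$\hat f(\cdot; \mathbf{X}_n)$ be a point of $\bar{\mathcal{L}}$ depending
on the random variables $X_1, \ldots, X_n = \mathbf{X}_n$ such that

$$\frac{\prod_{i=1}^{n} \hat f(X_i; \mathbf{X}_n)}{\prod_{i=1}^{n} f(X_i)} \ge c \quad \text{where } c > 0.$$

Then $P \hat f(\cdot; \mathbf{X}_n)$ converges to P f in the topology of
$\bar{\mathcal{L}}/R$ with F probability one.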

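Proof. For notational simplicity, write
$\hat f_n(\cdot) = \hat f(\cdot; \mathbf{X}_n)$; then if

$$\frac{\prod_{i=1}^{n} \hat f_n(x_i)}{\prod_{i=1}^{n} f(x_i)} \ge c > 0,$$

then

(2.1) $$\limsup_n \left[ \frac{\prod_{i=1}^{n} \hat f_n(x_i)}{\prod_{i=1}^{n} f(x_i)} \right]^{1/n} \ge 1,$$

and therefore

$$\limsup_n \big[ L(\hat f_n, \mathbf{x}_n) - L(f, \mathbf{x}_n) \big] \ge 0.$$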

Let U be any open neighborhood of P f. If $P \hat f_n$ is outside of U
infinitely often, then

$$\limsup_n \Big[ \sup_{Ph \notin U} L(h, \mathbf{x}_n) - L(f, \mathbf{x}_n) \Big] \ge 0.$$

By the corollary to Theorem 1, this can occur only on a set of measure
zero. Now, let $\{U_i\}$ be a decreasing sequence of neighborhoods of P f
whose intersection is P f. Corresponding to each $U_i$ there is an event
$S_i$ of F measure zero such that on the complement of $S_i$, $P \hat f_n$
is eventually in $U_i$. It follows that on the complement of
$\bigcup_{i=1}^{\infty} S_i$, $P \hat f_n$ is eventually in every
neighborhood of P f. Thus $P \hat f_n$ converges to P f in the topology of
$\bar{\mathcal{L}}/R$ with F probability 1, as was to be shown.

3. Application to Estimation of a Unimodal Density. Let our sample
space be the real line with the usual Borel field and Lebesgue measure.
Take $\mathcal{L}$ to be the class of unimodal densities uniformly bounded
by some constant M and such that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

That is, the class of densities for which there exists some x
such that $f(x) \le M$, f is nondecreasing on $(-\infty, x]$, and
nonincreasing on $[x, +\infty)$. Any such x is called a mode of f. Note
that if x and y are modes of f, then f(x) = f(y). If x is a mode of f, we
define the height of f to be the value f(x).
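The definitions of mode and height can be made concrete on a finite grid.
The following is a minimal sketch under the assumption that a density is
represented by its values at increasing grid points; the function name
`modes_and_height` and this discretization are illustrative and not part of
the paper.

```python
def modes_and_height(values):
    """Return (mode_indices, height) if the sequence is unimodal, else None.

    A sequence is treated as unimodal when it is nondecreasing up to some
    index and nonincreasing from that index on.
    """
    if not values:
        return None
    height = max(values)
    first = values.index(height)  # first index attaining the maximum
    # Nondecreasing on the indices up to and including the first maximum.
    rising = all(a <= b for a, b in zip(values[:first], values[1:first + 1]))
    # Nonincreasing on the indices from the first maximum onward.
    falling = all(a >= b for a, b in zip(values[first:], values[first + 1:]))
    if not (rising and falling):
        return None
    # Every index attaining the maximum is a mode; all share the same height.
    mode_indices = [i for i, v in enumerate(values) if v == height]
    return mode_indices, height
```

When the sequence is unimodal, the indices attaining the maximum form a
contiguous block, which is the grid analogue of the remark that any two
modes of f carry the same value.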

Any element of $\mathcal{L}$ has at most countably many points of
discontinuity. Give $\mathcal{L}$ the topology of pointwise convergence on
points of continuity. That is, given an element f in $\mathcal{L}$, we take
the exceptional set to be the points of discontinuity of f.

To form $\bar{\mathcal{L}}$, add to $\mathcal{L}$ all densities of the form
p f, $0 \le p < 1$, and f in $\mathcal{L}$, and give $\bar{\mathcal{L}}$
the topology of convergence on points of continuity. Again in this case
the exceptional set of p f will consist of the points of discontinuity
of f.

Note that the class of unimodal densities which are finite linear
combinations of characteristic functions is dense in $\bar{\mathcal{L}}$.
Moreover, the characteristic functions may be taken to be 1 on intervals
with rational end points, and the multiplicative coefficients may be taken
to be rational. Thus unimodal densities of this type form a countable dense
subset of $\bar{\mathcal{L}}$. Let C denote this countable dense subset of
$\bar{\mathcal{L}}$.

We will now construct a base for the neighborhood system at a point
f of $\bar{\mathcal{L}}$. Let D be a countable dense subset of the points
of continuity of f. For every positive integer n and every finite subset
$d_1, \ldots, d_m$ of D, we construct a neighborhood U of f as follows.

U = {g : i) $|g(d_i) - f(d_i)| < 1/n$ for i = 1, ..., m;

ii) there is a mode $m_1$ of g and a mode $m_2$ of f such that
$|m_1 - m_2| < 1/n$;

iii) if f has a mode which is a point of continuity, then
|height of g - height of f| < 1/n, otherwise there is
no restraint on the height of g.}

It follows that $\bar{\mathcal{L}}$ has a countable dense subset and is
locally separable, thus there exists a countable base for the topology of
$\bar{\mathcal{L}}$.

Hence, it suffices to verify that $\bar{\mathcal{L}}$ is sequentially
compact.

Suppose $\{f_n\}$ is a sequence in $\bar{\mathcal{L}}$. Let $m_n$ be a mode
of $f_n$. Then, if $\limsup_n m_n = +\infty$, there is a subsequence of
$\{f_n\}$ which converges to the density $z(\cdot)$ which is identically
zero. Similarly if $\liminf_n m_n = -\infty$ there is a subsequence of
$\{f_n\}$ converging to z.

If, on the other hand, $-\infty < a \le m_n \le b < +\infty$ for all large
n, then pick a subsequence such that $m_n$ converges to some finite value
m. Now, we have a sequence of uniformly bounded functions which are
nondecreasing on $(-\infty, m_n]$, and $m_n$ converges to m. Hence by
considerations similar to the Helly Weak Compactness theorem we may pick a
subsequence which is convergent on $(-\infty, m]$. Similarly, from this we
may pick a subsequence which is convergent on $[m, +\infty)$, and this
sequence is convergent in $\bar{\mathcal{L}}$, as was to be shown.

It can be seen that s(x,U) is in fact equal to the supremum of
g(x) where g is in the intersection of U and the countable dense
set C. Thus s(x,U) is the supremum of countably many measurable functions
and as such is measurable. The function log s(x,U) is F integrable
since it is bounded by log M.

For Condition 2, recall that $f \in \mathcal{L}$ implies that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1,$$

and note Theorem 6.1.

Thus  the  class  of  bounded  unimodal  densities  satisfies  all  the 
regularity  conditions ,  and  it  follows  that  the  maximum  likelihood  estimate 
of  the  density,  given  a  sample  of  observations,  is  consistent. 

R.  Pyke  (in  a  private  conversation)  has  suggested  an  algorithm  for 
computing  this  estimate.  This  and  other  applications  are  to  appear  in 
a  forthcoming  paper. 

In another paper [4] to appear, A. W. Marshall and Frank Proschan
consider the maximum likelihood estimate of a distribution with monotone
hazard rate. If we take $\Omega$ to be the half line $[0, +\infty)$, these
distributions satisfy our requirements for consistency of the estimate.
However, they consider $\Omega$ to be the whole real line and give a direct
method of proving consistency. An easy algorithm is given for computing
this estimate.

4. Appendix, Convergence Classes. Let $\bar{\mathcal{L}}$ be a space of
densities of probability measures on $(\Omega, \mathcal{B}, \mu)$, and
assume that with each $f \in \bar{\mathcal{L}}$ we associate a set
$B_f \in \mathcal{B}$ of $\mu$ measure zero. Then form the class
$\mathcal{C}$ of pairs $(\mathcal{S}, f)$, where $f \in \bar{\mathcal{L}}$
and $\mathcal{S} = \{f_\alpha, \alpha \in A\}$ is a net in
$\bar{\mathcal{L}}$ such that $f_\alpha(x)$ converges to f(x) for all x not
in $B_f$ and $\lim_\alpha B_{f_\alpha} \subset B_f$. Then, in the notation
of Kelley [2], we have:

Theorem 4.1 $\mathcal{C}$ is a convergence class.

Proof First note that if $f_\alpha = f$ for all $\alpha \in A$, then
$\{f_\alpha, \alpha \in A\}$ converges $\mathcal{C}$ to f.

Second, if $f_\alpha$ converges $\mathcal{C}$ to f, then off $B_f$ every
subnet of $f_\alpha(x)$ converges to f(x) and hence every subnet of
$\{f_\alpha, \alpha \in A\}$ converges to f.

Third, if $f_\alpha$ does not converge $\mathcal{C}$ to f, then there is
some point x not in $B_f$ such that $f_\alpha(x)$ does not converge to
f(x). Hence, there is a subnet $\{f_\alpha(x), \alpha \in C\}$, no subnet
of which converges to f(x). Thus no subnet of $\{f_\alpha, \alpha \in C\}$
converges to f.

Finally, we must show that $\mathcal{C}$ satisfies the theorem on iterated
limits. We can do no better here than to remind the reader of our
definition of convergence and refer him to Kelley [2], pages 69, 73,
and 74.

This completes the proof of Theorem 4.1, and hence having determined
the exceptional sets, there is precisely one topology on $\mathcal{L}$ and
one extension to $\bar{\mathcal{L}}$ which gives us the convergence
asserted.

5. Appendix, Quotient Spaces and Projections. Let R be the equivalence
relation on $\bar{\mathcal{L}}$:

f R g whenever f(x) = g(x)
except on a set of $\mu$ measure zero.

That is, two densities are R-equivalent when they are equivalent forms
of the Radon-Nikodym derivative of the same probability measure.

Let P be the projection of $\bar{\mathcal{L}}$ onto $\bar{\mathcal{L}}/R$.
Then, with the quotient topology P is continuous and hence we have

Lemma 5.1 The space $\bar{\mathcal{L}}/R$ is compact when
$\bar{\mathcal{L}}$ is.

Corollary 5.2 If $\bar{\mathcal{L}}$ is a compactification of
$\mathcal{L}$ whose elements are densities of probability measures, then
$\bar{\mathcal{L}}/R$ is a compactification of $\mathcal{L}/R$.


6. Appendix, On the Conditions 1-3. The following theorem concerning
Condition 2 is well known.

Theorem 6.1 Let f be a probability density such that
$\int f(x)\,\mu(dx) = 1$, and $E_F \log f(x) < +\infty$.
Let g be any probability density which is not R-equivalent to f, then

(i) $E_F \log g(x) < E_F \log f(x)$.

Proof Let f have support S, then

(ii) $$E_F \log \frac{g(x)}{f(x)} \le \log E_F \frac{g(x)}{f(x)} = \log \int_S \frac{g(x)}{f(x)} f(x)\,\mu(dx) \le 0.$$

Moreover

(iii) $$E_F \log \frac{g(x)}{f(x)} < \log E_F \frac{g(x)}{f(x)}$$

unless the real valued random variable g(x)/f(x) is almost surely equal
to some constant c. But, $c \le 1$ since $\int f(x)\,\mu(dx) = 1$ and
$\int g(x)\,\mu(dx) \le 1$. Now, if c were equal to 1, then g is equal to f
almost everywhere on the support of f, and therefore g must be R-equivalent
to f, which is a contradiction. Hence c < 1, and

$$E_F \log c < 0$$

as was to be shown.
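The inequality chain in the proof of Theorem 6.1 can be checked on a small
discrete example. This is a sketch under assumptions: the measure is taken
to be counting measure on three points, and the vectors `f` and `g` and the
helper names are hypothetical. Note that g is allowed total mass below one,
as in the text, and is assumed to have positive mass on the support of f.

```python
import math

def expected_log_ratio(f, g):
    """E_F log(g/f) over the support of f; -inf if g vanishes there."""
    total = 0.0
    for p, q in zip(f, g):
        if p > 0:
            if q <= 0:
                return float("-inf")
            total += p * math.log(q / p)
    return total

def log_expected_ratio(f, g):
    """log E_F (g/f), i.e. the log of the g-mass carried by the support of f."""
    return math.log(sum(q for p, q in zip(f, g) if p > 0))

f = [0.5, 0.3, 0.2]   # density with total mass one
g = [0.4, 0.4, 0.1]   # competing density with total mass 0.9
lhs = expected_log_ratio(f, g)   # E_F log(g/f)
mid = log_expected_ratio(f, g)   # log E_F (g/f), here log 0.9
```

Here `lhs` is strictly below `mid` because g/f is not constant on the
support of f, and `mid` is at most zero because g carries mass at most one;
together these give E_F log g < E_F log f for this pair.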

Some remarks on Condition 1 may be helpful.

In the sequel, let $(\Omega, \mathcal{B}, \mu)$ be the real line, Borel
field, and Lebesgue measure. If $\mathcal{D}$ is a class of absolutely
continuous distribution functions, then given F in $\mathcal{D}$, there
exists a set $B_F$ of $\mu$ measure zero such that off $B_F$ the formal
derivative exists and is unique. Thus, if $\mathcal{L}$ is a class of
formal derivatives of elements of $\mathcal{D}$ and the exceptional set of
an element f in $\mathcal{L}$ is the aforementioned set $B_F$ depending on
the corresponding distribution function, we have a natural and pleasing
topology on $\mathcal{L}$. It turns out (Theorem 6.4) that projection into
the quotient space $\mathcal{L}/R$ is a closed map, and the natural map
from $\mathcal{L}/R$ into the topological space $\mathcal{D}$ (with the
topology of convergence in distribution) is a homeomorphism. Thus
consistency of the derivative (in our topology) is equivalent to the
consistency of the corresponding estimate of the distribution function
(with the topology of convergence in distribution).

This leads us to the following definition: A topological space
$\bar{\mathcal{L}}$ of densities is well defined in case

(i) The exceptional set of a density in $\bar{\mathcal{L}}$ depends only
on the corresponding measure; that is, given a probability measure F,
all versions of the Radon-Nikodym derivative which are in
$\bar{\mathcal{L}}$ have the same exceptional set, call it $B_F$.

(ii) Given a probability measure F, all versions of the Radon-Nikodym
derivative of F which are in $\bar{\mathcal{L}}$ are equal off the
exceptional set $B_F$.

It should be pointed out that although the space $\mathcal{L}$ may be well
defined, it is not in general true that an otherwise suitable
compactification $\bar{\mathcal{L}}$ of $\mathcal{L}$ will be well defined.

For example, if $\mathcal{L}$ is the class of unimodal derivatives of
probability measures on $(-\infty, +\infty)$ with the topology of
convergence at points of continuity, then any suitable compactification of
$\mathcal{L}$ will contain densities which are not formal derivatives.

On the other hand, if $\mathcal{L}$ is the class of all nonincreasing
derivatives of probability measures on $[0, \infty)$, then $\mathcal{L}$ is
compact if we include measures with mass less than one. Hence
$\bar{\mathcal{L}}$ can be taken to be well defined, and $\mathcal{L}$ and
$\bar{\mathcal{L}}$ have the topology of convergence at points of
continuity.

In the sequel, we assume that $\bar{\mathcal{L}}$ is a compact well defined
space of probability densities. We let $\varphi$ be the natural map of an
element f of $\bar{\mathcal{L}}$ into $F(x) = \int_{-\infty}^{x} f(t)\,dt$,
and let $\mathcal{D}$ be the image under $\varphi$ of $\bar{\mathcal{L}}$,
with the topology of convergence in distribution. Since the continuous
image of a compact space is compact, and a subset of a compact space is
compact if and only if it is closed, we have:

Theorem 6.2 $\varphi$ is a closed map.

Now, an element of $\mathcal{D}$ is uniquely determined if we know its
value on some fixed countable dense subset; in fact we have the well known

Theorem 6.3 $\mathcal{D}$ is homeomorphic to a subspace of the cube of
dimension $\omega$ (the first infinite ordinal).

Hence, $\mathcal{D}$ is locally separable (in fact has a countable base),
and we need consider only sequences.


There is a natural one to one map (call it h) from $\mathcal{D}$ to
$\bar{\mathcal{L}}/R$. We can give $\bar{\mathcal{L}}/R$ a topology (call
it $C_h$) such that h is a homeomorphism. It follows that the map
$h(\varphi(\cdot))$ is a closed continuous map from the topological space
$\bar{\mathcal{L}}$ to the space $(\bar{\mathcal{L}}/R, C_h)$, and
therefore ([2], page 95) $C_h$ is precisely the quotient topology on
$\bar{\mathcal{L}}/R$. Hence

Theorem 6.4 The projection of $\bar{\mathcal{L}}$ onto
$\bar{\mathcal{L}}/R$ is a closed map, and the natural map from
$\bar{\mathcal{L}}/R$ to $\mathcal{D}$ is a homeomorphism.

Going back to the original space $\mathcal{L}$ we easily have

Theorem 6.5 (Condition 1) $\mathcal{L}/R$ is a locally separable, locally
compact Hausdorff space, and is homeomorphic to $\varphi(\mathcal{L})$.

Condition  3  on  the  supremum  function  is  not  so  easily  analyzed.  In 
all  of  the  cases  that  have  come  to  our  attention,  the  supremum  may  be  taken 
over  a  countable  class  of  measurable  functions,  and  as  such  is  measurable. 

The  integrability  of  this  function  may  be  assured  by  assuming  that  the  class 
of  densities  is  uniformly  bounded  above  by  some  constant.  If  the  resulting 
estimate  does  not  depend  on  the  constant,  then  we  may  conclude,  by  a  limiting 
argument,  that  the  estimate  is  consistent  without  the  condition  of  boundedness. 